{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!--NAVIGATION-->\n",
    "< [理解Python中的数据类型](02.01-Understanding-Data-Types.ipynb) | [目录](Index.ipynb) | [使用Numpy计算：通用函数](02.03-Computation-on-arrays-ufuncs.ipynb) >\n",
    "\n",
    "<a href=\"https://colab.research.google.com/github/wangyingsm/Python-Data-Science-Handbook/blob/master/notebooks/02.02-The-Basics-Of-NumPy-Arrays.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# The Basics of NumPy Arrays\n",
    "\n",
    "# NumPy数组基础"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Data manipulation in Python is nearly synonymous with NumPy array manipulation: even newer tools like Pandas ([Chapter 3](03.00-Introduction-to-Pandas.ipynb)) are built around the NumPy array.\n",
    "This section will present several examples of using NumPy array manipulation to access data and subarrays, and to split, reshape, and join the arrays.\n",
    "While the types of operations shown here may seem a bit dry and pedantic, they comprise the building blocks of many other examples used throughout the book.\n",
    "Get to know them well!\n",
    "\n",
    "Python中的数据操作基本就是NumPy数组操作的同义词：一些新的工具像Pandas（[第三章](03.00-Introduction-to-Pandas.ipynb)）都是依赖于NumPy数组建立起来的。本节会展示使用NumPy数组操作和访问数据以及子数组的一些例子，包括切分、变形和组合。尽管这里展示的操作有些枯燥和学术化，但是它们是组成本书后面使用的例子的基础。你应该更好的掌握它们。\n",
    "\n",
    "> We'll cover a few categories of basic array manipulations here:\n",
    "\n",
    "> - *Attributes of arrays*: Determining the size, shape, memory consumption, and data types of arrays\n",
    "> - *Indexing of arrays*: Getting and setting the value of individual array elements\n",
    "> - *Slicing of arrays*: Getting and setting smaller subarrays within a larger array\n",
    "> - *Reshaping of arrays*: Changing the shape of a given array\n",
    "> - *Joining and splitting of arrays*: Combining multiple arrays into one, and splitting one array into many\n",
    "\n",
    "我们会讨论下述数组操作的基本内容：\n",
    "\n",
    "- *数组的属性*: 获得数组的大小、形状、内存占用以及数据类型\n",
    "- *数组索引*: 获得和设置单个数组元素的值\n",
    "- *数组切片*: 获得和设置数组中的子数组\n",
    "- *数组变形*: 改变数组的形状\n",
    "- *组合和切分数组*: 将多个数组组合成一个，或者将一个数组切分成多个"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## NumPy Array Attributes\n",
    "\n",
    "## NumPy数组属性"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> First let's discuss some useful array attributes.\n",
    "We'll start by defining three random arrays, a one-dimensional, two-dimensional, and three-dimensional array.\n",
    "We'll use NumPy's random number generator, which we will *seed* with a set value in order to ensure that the same random arrays are generated each time this code is run:\n",
    "\n",
    "首先我们来讨论一些数组有用的属性。我们从定义三个数组开始，一个一维的，一个二维的和一个三维的数组。我们采用NumPy的随机数产生器来创建数组，产生之前我们会给定一个随机种子，这样来保证每次代码运行的时候都能得到相同的数组："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "np.random.seed(0)  # 设定随机种子，保证实验的可重现\n",
    "\n",
    "x1 = np.random.randint(10, size=6)  # 一维数组\n",
    "x2 = np.random.randint(10, size=(3, 4))  # 二维数组\n",
    "x3 = np.random.randint(10, size=(3, 4, 5))  # 三维数组"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Each array has attributes ``ndim`` (the number of dimensions), ``shape`` (the size of each dimension), and ``size`` (the total size of the array):\n",
    "\n",
    "每个数组都有属性`ndim`，代表数组的维度，`shape`代表每个维度的长度（形状）和`size`代表数组的总长度（元素个数）"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "x3 ndim:  3\n",
      "x3 shape: (3, 4, 5)\n",
      "x3 size:  60\n"
     ]
    }
   ],
   "source": [
    "# 输出三维数组的维度、形状和总长度\n",
    "print(\"x3 ndim: \", x3.ndim)\n",
    "print(\"x3 shape:\", x3.shape)\n",
    "print(\"x3 size: \", x3.size)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Another useful attribute is the ``dtype``, the data type of the array (which we discussed previously in [Understanding Data Types in Python](02.01-Understanding-Data-Types.ipynb)):\n",
    "\n",
    "另一个有用的属性是`dtype`，数组的数据类型（我们在上一节[理解Python的数据类型](02.01-Understanding-Data-Types.ipynb)中已经见过）。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "dtype: int64\n"
     ]
    }
   ],
   "source": [
    "print(\"dtype:\", x3.dtype)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Other attributes include ``itemsize``, which lists the size (in bytes) of each array element, and ``nbytes``, which lists the total size (in bytes) of the array:\n",
    "\n",
    "还有属性包括`itemsize`代表每个数组元素的长度（单位字节），`nbytes`代表数组的总字节长度："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "itemsize: 8 bytes\n",
      "nbytes: 480 bytes\n"
     ]
    }
   ],
   "source": [
    "print(\"itemsize:\", x3.itemsize, \"bytes\")\n",
    "print(\"nbytes:\", x3.nbytes, \"bytes\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> In general, we expect that ``nbytes`` is equal to ``itemsize`` times ``size``.\n",
    "\n",
    "通常，我们可以认为`nbytes`等于`itemsize`乘以`size`。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Array Indexing: Accessing Single Elements\n",
    "\n",
    "## 数组索引：获取单个元素"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> If you are familiar with Python's standard list indexing, indexing in NumPy will feel quite familiar.\n",
    "In a one-dimensional array, the $i^{th}$ value (counting from zero) can be accessed by specifying the desired index in square brackets, just as with Python lists:\n",
    "\n",
    "如果我们熟悉Python列表的索引方式，那么NumPy数组的索引方式也是很相似的。对于一维数组来说，第i个元素值（从0开始）可以使用中括号内的索引值获得："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([5, 0, 3, 3, 7, 9])"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x1[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "7"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x1[4]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> To index from the end of the array, you can use negative indices:\n",
    "\n",
    "需要从末尾进行索引取值，你可以使用负的索引值："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "9"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x1[-1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "7"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x1[-2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> In a multi-dimensional array, items can be accessed using a comma-separated tuple of indices:\n",
    "\n",
    "在多维数组中获取元素值，可以在中括号中使用一个索引值的元组：\n",
    "\n",
    "译者注：多维数组的索引方式与列表的列表索引方式是不同的。列表的列表在Python中需要使用多个中括号进行索引，如`x[i][j]`的方式。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[3, 5, 2, 4],\n",
       "       [7, 6, 8, 8],\n",
       "       [1, 6, 7, 7]])"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x2[0, 0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x2[2, 0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "7"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x2[2, -1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Values can also be modified using any of the above index notation:\n",
    "\n",
    "元素值也可以通过上述的索引语法进行修改："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[12,  5,  2,  4],\n",
       "       [ 7,  6,  8,  8],\n",
       "       [ 1,  6,  7,  7]])"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x2[0, 0] = 12\n",
    "x2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Keep in mind that, unlike Python lists, NumPy arrays have a fixed type.\n",
    "This means, for example, that if you attempt to insert a floating-point value to an integer array, the value will be silently truncated. Don't be caught unaware by this behavior!\n",
    "\n",
    "请记住，与Python的列表不同，NumPy数组是固定类型的。这意味着，如果你试图将一个浮点数值放入一个整数型数组，这个值会被默默地截成整数。这是比较容易犯的错误。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([3, 0, 3, 3, 7, 9])"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x1[0] = 3.14159  # 会被截成整数\n",
    "x1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Array Slicing: Accessing Subarrays\n",
    "\n",
    "## 数组切片：获取子数组"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the *slice* notation, marked by the colon (``:``) character.\n",
    "The NumPy slicing syntax follows that of the standard Python list; to access a slice of an array ``x``, use this:\n",
    "\n",
    "``` python\n",
    "x[start:stop:step]\n",
    "```\n",
    "\n",
    "> If any of these are unspecified, they default to the values ``start=0``, ``stop=``*``size of dimension``*, ``step=1``.\n",
    "We'll take a look at accessing sub-arrays in one dimension and in multiple dimensions.\n",
    "\n",
    "正如我们可以使用中括号获取单个元素值，我们也可以使用中括号的*切片*语法获取子数组，切片的语法遵从标准Python列表的切片语法格式；对于一个数组`x`进行切片：\n",
    "\n",
    "```python\n",
    "x[start:stop:step]\n",
    "```\n",
    "\n",
    "如果三个参数没有设置值的话，默认值分别是`start=0`，`stop=`*`维度的长度`*，`step=1`。我们来看看在一维数组和多维数组中进行切片取子数组的例子。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### One-dimensional subarrays\n",
    "\n",
    "### 一维子数组"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = np.arange(10)\n",
    "x"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0, 1, 2, 3, 4])"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x[:5]  # 前五个元素"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([5, 6, 7, 8, 9])"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x[5:]  # 从序号5开始的所有元素"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([4, 5, 6])"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x[4:7]  # 中间4~6序号的元素"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0, 2, 4, 6, 8])"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x[::2]  # 每隔一个取元素"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([1, 3, 5, 7, 9])"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x[1::2]  # 每隔一个取元素，开始序号为1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> A potentially confusing case is when the ``step`` value is negative.\n",
    "In this case, the defaults for ``start`` and ``stop`` are swapped.\n",
    "This becomes a convenient way to reverse an array:\n",
    "\n",
    "当step为负值时，将会在数组里反向的取元素，这是将数组反向排序最简单的方法：\n",
    "\n",
    "译者注，从其他编程语言转Python的初学者，很容易问一个问题，我想反序一个字符串，怎么找不到函数啊，內建的没有，str的方法也没有。答案是，因为根本不需要，例如：\n",
    "\n",
    "```python\n",
    "s = 'hello world'\n",
    "# 下面就会输出'dlrow olleh'\n",
    "print(s[::-1])\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x[::-1]  # 反序数组"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([5, 3, 1])"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x[5::-2]  # 从序号5开始向前取元素，每隔一个取一个元素"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Multi-dimensional subarrays\n",
    "\n",
    "### 多维子数组\n",
    "\n",
    "> Multi-dimensional slices work in the same way, with multiple slices separated by commas.\n",
    "For example:\n",
    "\n",
    "多维数组的切片也一样，只是在中括号中使用逗号分隔多个切片声明。例如："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[12,  5,  2,  4],\n",
       "       [ 7,  6,  8,  8],\n",
       "       [ 1,  6,  7,  7]])"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[12,  5,  2],\n",
       "       [ 7,  6,  8]])"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x2[:2, :3]  # 行的维度取前两个，列的维度取前三个，形状变为(2, 3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[12,  2],\n",
       "       [ 7,  8],\n",
       "       [ 1,  7]])"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x2[:3, ::2]  # 行的维度取前三个（全部），列的维度每个一个取一列，形状变为(3, 2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Finally, subarray dimensions can even be reversed together:\n",
    "\n",
    "最后，子数组的各维度还可以反序："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 7,  7,  6,  1],\n",
       "       [ 8,  8,  6,  7],\n",
       "       [ 4,  2,  5, 12]])"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x2[::-1, ::-1] # 行和列都反序，形状保持(3, 4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Accessing array rows and columns\n",
    "\n",
    "#### 获取数组的行和列\n",
    "\n",
    "> One commonly needed routine is accessing of single rows or columns of an array.\n",
    "This can be done by combining indexing and slicing, using an empty slice marked by a single colon (``:``):\n",
    "\n",
    "还有一种常见的需要是获取数组的单行或者单列。这可以通过组合索引和切片两个操作做到，使用一个不带参数的冒号`:`可以表示取该维度的所有元素："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[12  7  1]\n"
     ]
    }
   ],
   "source": [
    "print(x2[:, 0])  # x2的第一列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[12  5  2  4]\n"
     ]
    }
   ],
   "source": [
    "print(x2[0, :])  # x2的第一行"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> In the case of row access, the empty slice can be omitted for a more compact syntax:\n",
    "\n",
    "如果是获取行数据的话，可以省略后续的切片，写成更加简洁的方式："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[12  5  2  4]\n"
     ]
    }
   ],
   "source": [
    "print(x2[0])  # 等同于 x2[0, :]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Subarrays as no-copy views\n",
    "\n",
    "### 子数组是非副本的视图\n",
    "\n",
    "> One important–and extremely useful–thing to know about array slices is that they return *views* rather than *copies* of the array data.\n",
    "This is one area in which NumPy array slicing differs from Python list slicing: in lists, slices will be copies.\n",
    "Consider our two-dimensional array from before:\n",
    "\n",
    "一个非常重要和有用的概念你需要知道的就是数组的切片返回的实际上是子数组的*视图*而不是它们的副本。这是NumPy数组的切片和Python列表的切片的主要区别，列表的切片返回的是副本。用上面的二维做例子："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[12  5  2  4]\n",
      " [ 7  6  8  8]\n",
      " [ 1  6  7  7]]\n"
     ]
    }
   ],
   "source": [
    "print(x2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Let's extract a $2 \\times 2$ subarray from this:\n",
    "\n",
    "让我们从中取一个$2 \\times 2$的子数组："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[12  5]\n",
      " [ 7  6]]\n"
     ]
    }
   ],
   "source": [
    "x2_sub = x2[:2, :2]\n",
    "print(x2_sub)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Now if we modify this subarray, we'll see that the original array is changed! Observe:\n",
    "\n",
    "如果我们修改这个子数组，我们看到原来的数组也会随之更改："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[99  5]\n",
      " [ 7  6]]\n"
     ]
    }
   ],
   "source": [
    "x2_sub[0, 0] = 99\n",
    "print(x2_sub)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[99  5  2  4]\n",
      " [ 7  6  8  8]\n",
      " [ 1  6  7  7]]\n"
     ]
    }
   ],
   "source": [
    "print(x2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> This default behavior is actually quite useful: it means that when we work with large datasets, we can access and process pieces of these datasets without the need to copy the underlying data buffer.\n",
    "\n",
    "这个默认行为是很有用的：这意味着当我们在处理大数据集时，我们可以获取和处理其中的部分子数据集而不需要在内存中复制一份数据的副本。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Creating copies of arrays\n",
    "\n",
    "### 创建数组的副本\n",
    "\n",
    "> Despite the nice features of array views, it is sometimes useful to instead explicitly copy the data within an array or a subarray. This can be most easily done with the ``copy()`` method:\n",
    "\n",
    "尽管使用视图有上述的优点，有时候我们还是需要从数组中复制一份子数组出来。这可以使用`copy`方法简单的办到："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[99  5]\n",
      " [ 7  6]]\n"
     ]
    }
   ],
   "source": [
    "x2_sub_copy = x2[:2, :2].copy()\n",
    "print(x2_sub_copy)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> If we now modify this subarray, the original array is not touched:\n",
    "\n",
    "现在如果我们改变这个子数组，原数组会保持不变："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[42  5]\n",
      " [ 7  6]]\n"
     ]
    }
   ],
   "source": [
    "x2_sub_copy[0, 0] = 42\n",
    "print(x2_sub_copy)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[99  5  2  4]\n",
      " [ 7  6  8  8]\n",
      " [ 1  6  7  7]]\n"
     ]
    }
   ],
   "source": [
    "print(x2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reshaping of Arrays\n",
    "\n",
    "## 改变数组的形状\n",
    "\n",
    "> Another useful type of operation is reshaping of arrays.\n",
    "The most flexible way of doing this is with the ``reshape`` method.\n",
    "For example, if you want to put the numbers 1 through 9 in a $3 \\times 3$ grid, you can do the following:\n",
    "\n",
    "另一个数组的常用操作是改变形状。最方便的方式是使用`reshape`方法实现。例如，如果你希望将1~9的数放入一个$3 \\times 3$的数组里面，你可以这样做："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[1 2 3]\n",
      " [4 5 6]\n",
      " [7 8 9]]\n"
     ]
    }
   ],
   "source": [
    "grid = np.arange(1, 10).reshape((3, 3))\n",
    "print(grid)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Note that for this to work, the size of the initial array must match the size of the reshaped array. \n",
    "Where possible, the ``reshape`` method will use a no-copy view of the initial array, but with non-contiguous memory buffers this is not always the case.\n",
    "\n",
    "注意，改变形状要能成功，原始数组和新的形状的数组的总长度`size`必须一样。当可能的情况下，`reshape`会尽量使用原始数组的视图，但是如果原始数组的数据存储在不连续的内存区，就会进行复制。\n",
    "\n",
    "> Another common reshaping pattern is the conversion of a one-dimensional array into a two-dimensional row or column matrix.\n",
    "This can be done with the ``reshape`` method, or more easily done by making use of the ``newaxis`` keyword within a slice operation:\n",
    "\n",
    "另外一个常用的改变形状的操作就是将一个一维数组变成二维数组中的一行或者一列。这也可以使用`reshape`方法实现，或者更简单的方式是使用切片语法中的`newaxis`属性增加一个维度："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1, 2, 3]])"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = np.array([1, 2, 3])\n",
    "\n",
    "# 使用reshape变为 (1, 3)\n",
    "x.reshape((1, 3))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1, 2, 3]])"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 使用newaxis，增加行维度，形状也是 (1, 3)\n",
    "x[np.newaxis, :]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1],\n",
       "       [2],\n",
       "       [3]])"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 使用reshape变为 (3, 1)\n",
    "x.reshape((3, 1))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1],\n",
       "       [2],\n",
       "       [3]])"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 使用newaxis增加列维度，形状也是 (3, 1)\n",
    "x[:, np.newaxis]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> We will see this type of transformation often throughout the remainder of the book.\n",
    "\n",
    "我们会在本书后续的内容经常看到这样的变换。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Array Concatenation and Splitting\n",
    "\n",
    "## 数组的连接和切分\n",
    "\n",
    "> All of the preceding routines worked on single arrays. It's also possible to combine multiple arrays into one, and to conversely split a single array into multiple arrays. We'll take a look at those operations here.\n",
    "\n",
    "前面的方法都是在单个数组上进行操作。我们也可以将多个数组组成一个，或者反过来将一个数组切分成多个。下面我们来看看这些操作。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Concatenation of arrays\n",
    "\n",
    "### 连接数组\n",
    "\n",
    "> Concatenation, or joining of two arrays in NumPy, is primarily accomplished using the routines ``np.concatenate``, ``np.vstack``, and ``np.hstack``.\n",
    "``np.concatenate`` takes a tuple or list of arrays as its first argument, as we can see here:\n",
    "\n",
    "在NumPy中连接或者组合多个数组，有三个不同的方法`np.concatenate`，`np.vstack`和`np.hstack`。`np.concatenate`接受一个数组的元组或列表作为第一个参数，如下："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([1, 2, 3, 3, 2, 1])"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = np.array([1, 2, 3])\n",
    "y = np.array([3, 2, 1])\n",
    "np.concatenate([x, y])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> You can also concatenate more than two arrays at once:\n",
    "\n",
    "你也可以一次连接两个以上的数组："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ 1  2  3  3  2  1 99 99 99]\n"
     ]
    }
   ],
   "source": [
    "z = [99, 99, 99]\n",
    "print(np.concatenate([x, y, z]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> It can also be used for two-dimensional arrays:\n",
    "\n",
    "也可以用来连接二维数组："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [],
   "source": [
    "grid = np.array([[1, 2, 3],\n",
    "                 [4, 5, 6]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1, 2, 3],\n",
       "       [4, 5, 6],\n",
       "       [1, 2, 3],\n",
       "       [4, 5, 6]])"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 沿着第一个维度进行连接，即按照行连接，axis=0\n",
    "np.concatenate([grid, grid])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1, 2, 3, 1, 2, 3],\n",
       "       [4, 5, 6, 4, 5, 6]])"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 沿着第二个维度进行连接，即按照列连接，axis=1\n",
    "np.concatenate([grid, grid], axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> For working with arrays of mixed dimensions, it can be clearer to use the ``np.vstack`` (vertical stack) and ``np.hstack`` (horizontal stack) functions:\n",
    "\n",
    "进行连接的数组如果具有不同的维度，使用`np.vstack`（垂直堆叠）和`np.hstack`（水平堆叠）会更加清晰："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1, 2, 3],\n",
       "       [9, 8, 7],\n",
       "       [6, 5, 4]])"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = np.array([1, 2, 3])\n",
    "grid = np.array([[9, 8, 7],\n",
    "                 [6, 5, 4]])\n",
    "\n",
    "# 沿着垂直方向进行堆叠\n",
    "np.vstack([x, grid])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 9,  8,  7, 99],\n",
       "       [ 6,  5,  4, 99]])"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 沿着水平方向进行堆叠\n",
    "y = np.array([[99],\n",
    "              [99]])\n",
    "np.hstack([grid, y])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Similary, ``np.dstack`` will stack arrays along the third axis.\n",
    "\n",
    "类似的，`np.dstack`会沿着第三个维度（深度）进行堆叠。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Splitting of arrays\n",
    "\n",
    "### 切分数组\n",
    "\n",
    "> The opposite of concatenation is splitting, which is implemented by the functions ``np.split``, ``np.hsplit``, and ``np.vsplit``.  For each of these, we can pass a list of indices giving the split points:\n",
    "\n",
    "连接的反操作是切分，主要的方法包括`np.split`，`np.hsplit`和`np.vsplit`。我们可以传递一个列表参数表示切分的点："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1 2 3] [99 99] [3 2 1]\n"
     ]
    }
   ],
   "source": [
    "x = [1, 2, 3, 99, 99, 3, 2, 1]\n",
    "x1, x2, x3 = np.split(x, [3, 5]) # 在序号3和序号5处进行切分，返回三个数组\n",
    "print(x1, x2, x3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Notice that *N* split-points, leads to *N + 1* subarrays.\n",
    "The related functions ``np.hsplit`` and ``np.vsplit`` are similar:\n",
    "\n",
    "你应该已经发现N个切分点会返回N+1个子数组。相应的`np.hsplit`和`np.vsplit`也是相似的："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0,  1,  2,  3],\n",
       "       [ 4,  5,  6,  7],\n",
       "       [ 8,  9, 10, 11],\n",
       "       [12, 13, 14, 15]])"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grid = np.arange(16).reshape((4, 4))\n",
    "grid"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[0 1 2 3]\n",
      " [4 5 6 7]]\n",
      "[[ 8  9 10 11]\n",
      " [12 13 14 15]]\n"
     ]
    }
   ],
   "source": [
    "upper, lower = np.vsplit(grid, [2]) # 沿垂直方向切分，切分点行序号为2\n",
    "print(upper)\n",
    "print(lower)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 0  1]\n",
      " [ 4  5]\n",
      " [ 8  9]\n",
      " [12 13]]\n",
      "[[ 2  3]\n",
      " [ 6  7]\n",
      " [10 11]\n",
      " [14 15]]\n"
     ]
    }
   ],
   "source": [
    "left, right = np.hsplit(grid, [2]) # 沿水平方向切分数组，切分点列序号为2\n",
    "print(left)\n",
    "print(right)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Similarly, ``np.dsplit`` will split arrays along the third axis.\n",
    "\n",
    "同样`np.dsplit`会沿着第三个维度切分数组。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!--NAVIGATION-->\n",
    "< [理解Python中的数据类型](02.01-Understanding-Data-Types.ipynb) | [目录](Index.ipynb) | [使用Numpy计算：通用函数](02.03-Computation-on-arrays-ufuncs.ipynb) >\n",
    "\n",
    "<a href=\"https://colab.research.google.com/github/wangyingsm/Python-Data-Science-Handbook/blob/master/notebooks/02.02-The-Basics-Of-NumPy-Arrays.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
